Using Finite State Technology in Natural Language Processing of Basque
Abstract
This paper describes the components used in the design and implementation of NLP tools for Basque. These components are based on finite-state technology and are dedicated to the morphological analysis of Basque, an agglutinative pre-Indo-European language. We believe our design may also be useful for processing other languages. The main components developed are a general, robust morphological analyser/generator (Alegria et al., 96) and a spelling checker/corrector for Basque named Xuxen (Aldezabal et al., 99). The analyser is a basic tool for current and future work on Basque NLP, such as the lemmatiser/tagger Euslem (Ezeiza et al., 98), an intranet search engine (Aizpurua et al., 00), and an assistant for verse-making (Arrieta et al., 00).
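To illustrate how a single finite-state network can serve as both analyser and generator, here is a minimal Python sketch. It assumes a hypothetical two-stem lexicon and invented tag names (`+Abs`, `+Erg`, `+Ine`, `+Abl`, `+Sg`, `+Pl`). It builds a small acyclic transducer whose arcs pair a lexical (upper) label with a surface (lower) label. Reading the surface side analyses a word, and reading the lexical side generates it. The sketch uses only vowel-final stems, so it sidesteps the morphophonological alternations a full analyser must model. It is not the analyser described in (Alegria et al., 96).

```python
from collections import defaultdict


class ToyFST:
    """Minimal acyclic finite-state transducer (illustrative only).

    Each arc pairs an upper (lexical) label with a lower (surface) label;
    either label may be the empty string, i.e. an epsilon on that side.
    """

    def __init__(self) -> None:
        self.arcs = defaultdict(list)  # state -> [(upper, lower, target)]
        self.finals = set()
        self.n_states = 1              # state 0 is the start state

    def new_state(self) -> int:
        self.n_states += 1
        return self.n_states - 1

    def add_arc(self, src: int, upper: str, lower: str, dst: int) -> None:
        self.arcs[src].append((upper, lower, dst))

    def _apply(self, text: str, read_side: int) -> list:
        """Collect the outputs of every path that consumes all of `text`.

        read_side=1 reads surface labels and writes lexical ones (analysis);
        read_side=0 reads lexical labels and writes surface ones (generation).
        The search terminates because the network built below is acyclic.
        """
        results = set()
        stack = [(0, 0, "")]  # (state, position in text, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(text) and state in self.finals:
                results.add(out)
            for arc in self.arcs[state]:
                read, write, dst = arc[read_side], arc[1 - read_side], arc[2]
                if text.startswith(read, pos):
                    stack.append((dst, pos + len(read), out + write))
        return sorted(results)

    def analyse(self, surface: str) -> list:
        return self._apply(surface, read_side=1)

    def generate(self, lexical: str) -> list:
        return self._apply(lexical, read_side=0)


def build_toy_basque_fst() -> ToyFST:
    # Hypothetical toy lexicon: vowel-final noun stems only.
    stems = ["etxe", "mendi"]  # 'house', 'mountain'
    # Hypothetical tag names mapped to surface suffixes.
    suffixes = {
        "+Abs+Sg": "a",
        "+Abs+Pl": "ak",
        "+Erg+Sg": "ak",
        "+Ine+Sg": "an",
        "+Abl+Sg": "tik",
    }
    fst = ToyFST()
    noun_end = fst.new_state()  # continuation point shared by all stems
    final = fst.new_state()
    fst.finals.add(final)

    for stem in stems:
        src = 0
        for i, ch in enumerate(stem):
            dst = noun_end if i == len(stem) - 1 else fst.new_state()
            fst.add_arc(src, ch, ch, dst)  # identity arc for stem letters
            src = dst

    for tags, suffix in suffixes.items():
        # Write the tags on the lexical side without reading surface input,
        # then read the suffix letters on the surface side.
        src = fst.new_state()
        fst.add_arc(noun_end, tags, "", src)
        for i, ch in enumerate(suffix):
            dst = final if i == len(suffix) - 1 else fst.new_state()
            fst.add_arc(src, "", ch, dst)
            src = dst
    return fst


fst = build_toy_basque_fst()
print(fst.analyse("mendiak"))        # ['mendi+Abs+Pl', 'mendi+Erg+Sg']
print(fst.analyse("etxean"))         # ['etxe+Ine+Sg']
print(fst.generate("mendi+Abl+Sg"))  # ['menditik']
```

*mendiak* receives two analyses because the suffix -ak marks both the plural absolutive and the singular ergative. A finite-state analyser returns every accepting path rather than choosing one, and leaves disambiguation to later tools such as a tagger.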